Chapter 5 LAGOSNE Water Quality Analysis
lagoslakes.org archives observations of lakes over time. Here we visualize lakes in Minnesota, Iowa, & Illinois geospatially using Leaflet. In this analysis we will look at Chlorophyll A & Secchi depth (meters). Chlorophyll A (micro grams / liter) is a proxy measure for the amount of algae in a water body. Secchi depth is a measure of water clarity.
5.1 Loading in data
5.1.1 Download and identify the site locus(latiutde longitude)
#Lagos download script
#lagosne_get(dest_folder = LAGOSNE:::lagos_path(),overwrite=T)
#Load in lagos
lagos <- lagosne_load()
#Grab the lake centroid info
lake_centers <- lagos$locus
# Make an sf object
spatial_lakes <- st_as_sf(lake_centers,coords=c('nhd_long','nhd_lat'),
crs=4326)
#Grab the water quality data
nutr <- lagos$epi_nutr
#Look at column names
#names(nutr)
# subset columns nutr to only keep key info that we want
clarity_only <- nutr %>%
dplyr::select(lagoslakeid,sampledate,chla,doc,secchi) %>%
mutate(sampledate = as.character(sampledate) %>% ymd(.))5.1.2 Filter sites with at least 200 observations
This is done to ensure we have enough data for a given lake to conduct analysis. This was arbitrarily chosen. See Why 200 observations for an analysis of the distribution of observations per lake.
#Look at the number of rows of data
#nrow(clarity_only)
chla_secchi <- clarity_only %>%
filter(!is.na(chla),
!is.na(secchi))
# How many observatiosn did we lose?
filteredObservations = nrow(clarity_only) - nrow(chla_secchi)
# Keep only the lakes with at least 200 observations of secchi and chla
chla_secchi_200 <- chla_secchi %>%
group_by(lagoslakeid) %>%
mutate(count = n()) %>%
filter(count > 200)
# Join water quality data to spatial data
spatial_200 <- inner_join(
spatial_lakes,chla_secchi_200 %>% distinct(lagoslakeid,.keep_all=T),
by='lagoslakeid'
)We lost 651095 observations because they were missing Secchi or Chlorophyll data.
5.2 Mean Chlorophyll A map
### Take the mean Chlorophyll A and Secchi by lake
mean_values_200 <- chla_secchi_200 %>%
# Take summary by lake id
group_by(lagoslakeid) %>%
# take mean chl_a per lake id
summarize(
mean_chla = mean(chla,na.rm=T),
mean_secchi=mean(secchi,na.rm=T)
) %>%
#Get rid of NAs
filter(
!is.na(mean_chla),
!is.na(mean_secchi)
) %>%
# Take the log base 10 of the mean_chl
mutate(log10_mean_chla = log10(mean_chla))
#Join datasets
mean_spatial <- inner_join(
spatial_lakes,
mean_values_200,
by='lagoslakeid'
)
#Make a map
mapview(mean_spatial,zcol='log10_mean_chla', layer.name = 'Mean Chlorophyll A Log 10')5.3 Correlation between Secchi Disk Depth and Chlorophyll A
Chlorophyll blocks light and obscures the secchi disk. As chlorophyll increases, Secchi depth decreases. Additionally algae are the primary producers of chlorophyll along with weeds. Lakes that have a higher nutrient load leads to more chlorophyll, these dissolved nutrients also tend to cloud waters and obscure secchi disks. Finally, deeper lakes tend to produce less photosynthetic algae due to increased mechanical mixing of different water layers.
Figure 5.1: Secchi depth vs Chlorophyll
5.4 Which states have the most data?
# Make a lagos spatial dataset that has the total number of counts per site.
# Get count for each lake
lago_summary = chla_secchi %>%
#slice(1:10000) %>%
group_by(lagoslakeid) %>%
summarize(
mean_chla = mean(chla,na.rm=T),
mean_secchi=mean(secchi,na.rm=T),
count=n()
)
## Join to lake location
lago_location_summary =
merge(
x = lago_summary,
y = lake_centers,
by = "lagoslakeid",
all.x = TRUE
) %>%
st_as_sf(coords=c('nhd_long','nhd_lat'),crs=4326)
# Show all points on the map
#mapview(lago_location_summary)
# join this point dataset to the us_boundaries data.
state_bounds = us_states() %>%
dplyr::select(state_name,state_abbr)
lago_location_summary_join_state =
st_join(
x = lago_location_summary,
y = state_bounds,
left = TRUE
)
lago_location_summary_join_state_200 = lago_location_summary_join_state %>%
filter(count > 200)
# Group by state and sum all the observations in that state and arrange that data from most to least total observations per state.
state_data = lago_location_summary_join_state %>%
group_by(state_name) %>%
summarize(
mean_chla = mean(mean_chla,na.rm=T),
mean_secchi=mean(mean_secchi,na.rm=T),
count=sum(count)
) %>%
arrange(desc(count))
# verify all observations totaled correctly **success**
#sum(state_data$count)
state_data_200 = lago_location_summary_join_state_200 %>%
group_by(state_name) %>%
summarize(
mean_chla = mean(mean_chla,na.rm=T),
mean_secchi=mean(mean_secchi,na.rm=T),
count=sum(count)
) %>%
arrange(desc(count))# map of where state with most values are
state_obs_count = st_join(
x = state_bounds,
y = state_data,
left = TRUE
)
# map of where state with most values are
state_obs_count_200 = st_join(
x = state_bounds,
y = state_data_200,
left = TRUE
)5.5 Why 200 observations
Histogram 5.2 shows the distribution of observations / lake. The 200 observations per lake is somewhat arbitrary. Looking at 10 to 50 observations per lake would capture more than the outlier lakes with an extreme amount of observation activity.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Figure 5.2: Distribution of observations / lake
5.6 25 observations per lake
At 25 observations per lake this demonstrates 2 clear spatial relations for Secchi depth:
Increasing latitude is positively correlated with Secchi depth. This could be to decreased agricultural activity in the north.
Increased distance from population centers is positively correclated with Secchi depth. This could be to less human made pollution adding nutrient to the water.